Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping

Authors

  • Seungwoo Lee
  • Gary Geunbae Lee
Abstract

One of the issues in bootstrapping for named entity recognition is how to control the annotation errors introduced at every iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, an encyclopedia, and Web documents. The bootstrapping is applied to identifying and classifying fine-grained geographic named entities, which are useful for applications such as information extraction and question answering, as well as standard named entities such as PERSON and ORGANIZATION. The experiments show the usefulness of the suggested heuristics and the learning curve evaluated at each bootstrapping loop. When our approach was applied to a newspaper corpus, it achieved an F1 value of 87, which is quite promising for the fine-grained named entity recognition task.
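The idea of controlling errors at each bootstrapping iteration can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `bootstrap`, the single-token entities, the one-word left/right context patterns, and the `min_votes` agreement filter are all assumptions made for illustration. In particular, the simple vote filter only stands in for the paper's heuristics, which rely on external resources (WordNet, an encyclopedia, Web documents) that are not modeled here.

```python
from collections import Counter, defaultdict


def bootstrap(corpus, seeds, iterations=2, min_votes=2):
    """Grow an {entity: label} lexicon from seeds using context patterns.

    corpus: list of tokenized sentences (lists of strings).
    seeds: dict mapping a single-token entity to its label.
    All names and parameters here are hypothetical, for illustration only.
    """
    lexicon = dict(seeds)
    for _ in range(iterations):
        # 1. Learn (previous token, next token) contexts around known entities.
        patterns = defaultdict(Counter)
        for sent in corpus:
            padded = ["<s>"] + sent + ["</s>"]
            for i in range(1, len(padded) - 1):
                label = lexicon.get(padded[i])
                if label is not None:
                    patterns[(padded[i - 1], padded[i + 1])][label] += 1

        # 2. Apply the contexts to unlabeled tokens and collect label votes.
        votes = defaultdict(Counter)
        for sent in corpus:
            padded = ["<s>"] + sent + ["</s>"]
            for i in range(1, len(padded) - 1):
                tok = padded[i]
                ctx = (padded[i - 1], padded[i + 1])
                if tok not in lexicon and ctx in patterns:
                    votes[tok].update(patterns[ctx])

        # 3. Error control (assumed stand-in heuristic): accept a candidate
        #    only if all its votes agree on one label and there are at
        #    least min_votes of them.
        added = False
        for tok, counts in votes.items():
            if len(counts) == 1:
                label, n = next(iter(counts.items()))
                if n >= min_votes:
                    lexicon[tok] = label
                    added = True
        if not added:
            break  # nothing new was learned, so further loops are pointless
    return lexicon
```

With seeds such as `{"Seoul": "CITY"}`, a token that appears in two contexts already seen around `Seoul` would be added as `CITY`, while a token supported by only one context would be rejected. Raising `min_votes` trades recall for fewer annotation errors carried into the next iteration.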


Similar resources

A Bootstrapping Approach for Geographic Named Entity Annotation

Geographic named entities can be classified into many subtypes that are useful for applications such as information extraction and question answering. In this paper, we present a bootstrapping algorithm for the task of geographic named entity annotation. In the initial stage, we annotate a raw corpus using seeds. From the initial annotation, boundary patterns are learned and applied to the corp...


Microsoft Word - camera-ready.docx

We explore methods for effectively extracting information from clinical narratives, which are captured in a public health consulting phone service called HealthLink. The currently available data consists of dialogues constructed by nurses while consulting patients on the phone. Since the data are interviews transcribed by nurses during phone conversations, they include a significant volume and ...


Bootstrapping Biomedical Ontologies for Scientific Text using NELL

We describe an open information extraction system for biomedical text based on NELL (the Never-Ending Language Learner) (Carlson et al., 2010), a system designed for extraction from Web text. NELL uses a coupled semi-supervised bootstrapping approach to learn new facts from text, given an initial ontology and a small number of “seeds” for each ontology category. In contrast to previous applicat...


Factors Affecting Medication Errors from Nurses' Perspective: Lessons Learned

Introduction: Medical errors are among the most serious threats to patient safety in all countries. The most frequent medical errors are medication errors, which can lead to serious effects and even death in patients. Therefore, this study aimed to explain factors affecting medication errors from the viewpoints of nurses in order to present strategies to reduce these errors. Methods:...


A Bootstrapping Method to Assess Software Impact in Full-Text Papers

Introduction and Motivation: There is a concerted effort to study the science of science in multiple spheres. However, a clear gap exists in how to incorporate digital outputs, such as software, as an integral component in scholarly communication. This tension has become aggravated in recent years because software can be the end product of many scientific inquiries. Therefore, there is the need to ...




Publication date: 2005